Speculative Clustered Caches for Clustered Processors
Authors
Abstract
Clustering is a technique for partitioning a superscalar processor's execution resources to simultaneously allow for more in-flight instructions, wider issue width, and more aggressive clock speeds. As either the size of individual clusters or the total number of clusters increases, the distance to the first-level data cache increases as well. Although clustering may expose more parallelism by allowing a greater number of instructions to be simultaneously analyzed and issued, the gains may be obliterated if the latencies to memory grow too large. We propose to augment each cluster with a small, fast, simple Level Zero (L0) data cache that is accessed in parallel with a traditional L1 data cache. The difference between our solution and other proposed caching techniques for clustered processors is that we do not support versioning or coherence. This may occasionally result in a load instruction that reads a stale value from the L0 cache, but the common case is a low-latency hit in the L0 cache. Our simulation studies show that 4KB, 2-way set associative L0 caches provide a 6.5-12.3% IPC improvement over a wide range of processor configurations.
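The behaviour described in the abstract can be illustrated with a toy model. This is a minimal sketch, not the authors' simulator: the 4KB size and 2-way associativity come from the abstract, while the 32-byte line size, the LRU replacement, the write-allocate policy on stores, and the way a stale read is detected (comparing against the L1 value) are all assumptions made for illustration. The names `L0Cache` and `ClusteredCore` are hypothetical.

```python
from collections import OrderedDict

LINE_BYTES = 32            # ASSUMPTION: line size is not given in the abstract
L0_BYTES = 4 * 1024        # 4KB, as stated in the abstract
WAYS = 2                   # 2-way set associative, as stated in the abstract
NUM_SETS = L0_BYTES // (LINE_BYTES * WAYS)


class L0Cache:
    """Per-cluster L0 cache with LRU replacement (ASSUMPTION) and no coherence."""

    def __init__(self):
        # One OrderedDict per set: line number -> {byte offset: value}
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def lookup(self, addr):
        line = addr // LINE_BYTES
        ways = self.sets[line % NUM_SETS]
        if line not in ways:
            return None                      # L0 miss
        ways.move_to_end(line)               # mark as most recently used
        return ways[line].get(addr % LINE_BYTES, 0)

    def fill(self, line, data):
        ways = self.sets[line % NUM_SETS]
        if line not in ways and len(ways) >= WAYS:
            ways.popitem(last=False)         # evict the least recently used line
        ways[line] = dict(data)
        ways.move_to_end(line)


class ClusteredCore:
    """Hypothetical model: one shared L1 (a dict) plus one L0 per cluster."""

    def __init__(self, num_clusters):
        self.l1 = {}                         # addr -> value
        self.l0 = [L0Cache() for _ in range(num_clusters)]
        self.stale_reads = 0

    def _line_data(self, line):
        base = line * LINE_BYTES
        return {a - base: v for a, v in self.l1.items()
                if base <= a < base + LINE_BYTES}

    def store(self, cluster, addr, value):
        self.l1[addr] = value
        line = addr // LINE_BYTES
        # Only the storing cluster's L0 is updated; the other L0s are left
        # untouched because there is no coherence between them.
        self.l0[cluster].fill(line, self._line_data(line))

    def load(self, cluster, addr):
        line = addr // LINE_BYTES
        l1_value = self.l1.get(addr, 0)      # L1 is accessed in parallel
        l0_value = self.l0[cluster].lookup(addr)
        if l0_value is None:
            self.l0[cluster].fill(line, self._line_data(line))
            return l1_value, "L0 miss"
        if l0_value != l1_value:
            # ASSUMPTION: staleness is detected by comparing with L1.
            self.stale_reads += 1
            self.l0[cluster].fill(line, self._line_data(line))
            return l1_value, "L0 hit (stale, corrected)"
        return l0_value, "L0 hit"


if __name__ == "__main__":
    core = ClusteredCore(num_clusters=2)
    core.store(0, 0x100, 1)
    print(core.load(1, 0x100))   # cluster 1 has not seen this line yet
    print(core.load(1, 0x100))   # now served by cluster 1's L0
    core.store(0, 0x100, 2)      # cluster 0 writes; cluster 1's L0 is not told
    print(core.load(1, 0x100))   # cluster 1's L0 copy is stale
    print("stale reads:", core.stale_reads)
```

In this sketch the stale read occurs only when one cluster writes a line that another cluster already holds in its L0, which matches the abstract's claim that the common case is a plain low-latency L0 hit.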
Similar resources
Thesis - Vasileios Porpodas
Very Long Instruction Word (VLIW) processors are wide-issue statically scheduled processors. Instruction scheduling for these processors is performed by the compiler and is therefore a critical factor for its operation. Some VLIWs are clustered, a design that improves scalability to higher issue widths while improving energy efficiency and frequency. Their design is based on physically partitio...
The Increment Predictor for Speculative Multithreaded
The speculative multithreading paradigm (speculative thread-level parallelism) is based on the concurrent execution of control-speculative threads. The efficiency of microarchitectures that adopt this paradigm strongly depends on the performance of the control and data speculation techniques. While control speculation is used to predict the most effective points where a thread can be spawned, da...
Parallel Pull-Based LRU: A Request Distribution Algorithm for Clustered Web Caches Using a DSM for Memory Mapped Networks
The SIRAC laboratory has developed SciFS, a Distributed Shared Memory (DSM) that tries to benefit from the high performances and the remote addressing capabilities of the Scalable Coherent Interface (SCI) memory mapped network. We use SciFS for high performance cluster computing but we now experiment with it to build large scale clustered web caches. We propose Whoops! a clustered web cache pro...
Thread-Spawning Schemes for Speculative Multithreading
Speculative multithreading has been recently proposed to boost performance by means of exploiting thread-level parallelism in applications difficult to parallelize. The performance of these processors heavily depends on the partitioning policy used to split the program into threads. Previous work uses heuristics to spawn speculative threads based on easily-detectable program constructs such as ...
Cluster Level Multithreading for VLIW Processors
Clustered VLIW embedded processors have become widespread due to benefits of simple hardware and low power. However, the ILP in most of the applications today is limited and discourages the design of wider issue processors. Simultaneous MultiThreading (SMT) is a well known technique to improve the resource utilization by exploiting thread level ILP. However, implementing SMT is not feasible for e...